Shift of pairwise similarities for data clustering

Abstract

Several clustering methods (e.g., Normalized Cut and Ratio Cut) divide the Min Cut cost function by a cluster-dependent factor (e.g., the size or the degree of the clusters) in order to yield a more balanced partitioning. We instead investigate adding such regularizations to the original cost function. We first consider the case where the regularization term is the sum of the squared sizes of the clusters, and then generalize it to adaptive regularization of the pairwise similarities. This leads to shifting (adaptively) the pairwise similarities, which might make some of them negative. We then study the connection of this method to Correlation Clustering and propose an efficient local search optimization algorithm with a fast theoretical convergence rate to solve the new clustering problem. In the following, we investigate the shift of pairwise similarities on some common clustering methods, and finally we demonstrate the superior performance of the method by extensive experiments on different datasets.
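The first special case named in the abstract can be checked numerically. The sketch below is not the paper's algorithm (it omits the adaptive shift and the local search); it only illustrates why adding a multiple of the sum of squared cluster sizes to the Min Cut cost amounts to subtracting a constant from every pairwise similarity. The function names, the weight `alpha`, and the convention of summing over ordered pairs (diagonal included) are assumptions made for this illustration.

```python
import random


def min_cut_cost(S, labels):
    """Sum of similarities over ordered pairs (i, j) assigned to different clusters."""
    n = len(labels)
    return sum(
        S[i][j]
        for i in range(n)
        for j in range(n)
        if labels[i] != labels[j]
    )


def regularized_cost(S, labels, alpha):
    """Min Cut cost plus alpha times the sum of squared cluster sizes."""
    sizes = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    return min_cut_cost(S, labels) + alpha * sum(s * s for s in sizes.values())


def shift(S, alpha):
    """Subtract alpha from every pairwise similarity (entries may become negative)."""
    return [[s - alpha for s in row] for row in S]


# Hypothetical toy data: a random symmetric similarity matrix with zero diagonal.
rng = random.Random(0)
n, alpha = 6, 0.3
S = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        S[i][j] = S[j][i] = rng.random()

labels = [0, 0, 1, 1, 1, 2]

# The two costs differ only by alpha * n**2, which does not depend on the
# partition, so both objectives rank all partitions identically.
lhs = regularized_cost(S, labels, alpha)
rhs = min_cut_cost(shift(S, alpha), labels) + alpha * n * n
assert abs(lhs - rhs) < 1e-9
```

Because the offset `alpha * n * n` is the same for every partition, minimizing the regularized cost and minimizing Min Cut on the shifted similarities select the same clustering under these assumptions.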

Related papers

Cost functions for pairwise data clustering

Cost functions for non-hierarchical pairwise clustering are introduced, in the probabilistic autoencoder framework, by requiring maximal average similarity between the input and the output of the autoencoder. Clustering is thus formulated as the problem of finding the ground state of Potts spin Hamiltonians. The partition provided by this procedure identifies clusters with dense connected r...

Pairwise Data Clustering by Deterministic Annealing

Partitioning a data set and extracting hidden structure from the data arise in different application areas of pattern recognition, speech and image processing. Pairwise data clustering is a combinatorial optimization method for data grouping which extracts hidden structure from proximity data. We describe a deterministic annealing approach to pairwise clustering which shares the robustness pro...

Entropy-based Consensus for Distributed Data Clustering

The increasingly large scale of available data and the more restrictive concerns about their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with consideration for the confidentiality of data; i.e., it is the negotiations among local cluster centers that are used in t...

Context clustering for Word Sense Disambiguation based on modeling pairwise context similarities

Traditionally, word sense disambiguation (WSD) involves a different context model for each individual word. This paper presents a new approach to WSD using weakly supervised learning. Statistical models are not trained for the contexts of each individual word, but for the similarities between context pairs at category level. The insight is that the correlation regularity between the sense disti...

The clustering and classification data mining techniques in insurance fraud detection: the case of Iranian car insurance

Given the ever-growing spread of fraud in the insurance sector, especially in car insurance, and its negative consequences for insurance companies, applying appropriate and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...

Journal

Journal title: Machine Learning

Year: 2022

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-022-06189-6